Mutual-learning sequence-level knowledge distillation for automatic speech recognition


Abstract

Automatic speech recognition (ASR) is a crucial technology for man-machine interaction. End-to-end models have been studied recently in deep learning for ASR. However, these models are not suitable for the practical application of ASR due to their large model sizes and computation costs. To address this issue, we propose a novel mutual-learning sequence-level knowledge distillation framework with distinct student structures. Trained mutually and simultaneously, each student learns not only from the pre-trained teacher but also from its peers, which can improve the generalization capability of the whole network by making up for the insufficiency of each student and bridging the gap between each student and the teacher. Extensive experiments on the TIMIT and LibriSpeech corpora show that, compared with state-of-the-art methods, the proposed method achieves an excellent balance between accuracy and compression.
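The abstract says each student learns from the pre-trained teacher and from its peers. A minimal sketch of that idea follows, and it is not the authors' published loss. It assumes that every model scores the same N-best list of hypothesis sequences, that those scores are normalized into a distribution over hypotheses, and that a student's loss is a weighted sum of a supervised term, a teacher term, and an averaged peer term. The function names and the weights `alpha` and `beta` are hypothetical.

```python
import math


def normalize_scores(log_scores):
    """Softmax over hypothesis log-scores: a distribution over an N-best list."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def mutual_distillation_losses(teacher_probs, student_probs, reference_index,
                               alpha=0.5, beta=0.5):
    """Return one loss per student (assumed form, for illustration only).

    teacher_probs   : teacher distribution over the shared N-best list
    student_probs   : list of student distributions over the same list
    reference_index : position of the ground-truth sequence in the list
    alpha, beta     : hypothetical weights for the teacher and peer terms
    """
    losses = []
    for i, p_i in enumerate(student_probs):
        # Supervised term: negative log-probability of the reference sequence.
        supervised = -math.log(p_i[reference_index] + 1e-12)
        # Teacher term: pull the student towards the frozen teacher.
        from_teacher = kl_divergence(teacher_probs, p_i)
        # Peer term: pull the student towards the average of its peers.
        peers = [p_j for j, p_j in enumerate(student_probs) if j != i]
        from_peers = (sum(kl_divergence(p_j, p_i) for p_j in peers) / len(peers)
                      if peers else 0.0)
        losses.append(supervised + alpha * from_teacher + beta * from_peers)
    return losses
```

In a real training loop the teacher would stay fixed while all students are updated in the same step, each with its own loss from this list. When a student already matches both the teacher and its peers, the two distillation terms vanish and only the supervised term remains.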


Similar resources

Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...


Prosodic knowledge sources for automatic speech recognition

In this work, different prosodic knowledge sources are integrated into a state-of-the-art large vocabulary speech recognition system. Prosody manifests itself on different levels in the speech signal: within the words as a change in phone durations and pitch, between the words as a variation in the pause length, and beyond the words, correlating with higher linguistic structures and nonlexica...


Speech production knowledge in automatic speech recognition.

Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena obser...


Sequence Learning and Speech Recognition

This paper describes application possibilities for statistical methods, particularly hidden Markov models, in the domain of sequence learning after treating the required basics. In particular, the task of natural language processing is treated by elaborating on speech recognition and speech translation. Furthermore a network intrusion detection system to detect complex and coordinated Internet att...


A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...



Journal

Journal title: Neurocomputing

Year: 2021

ISSN: 0925-2312, 1872-8286

DOI: https://doi.org/10.1016/j.neucom.2020.11.025